Performance Optimization on GPGPU & Multicore CPU Using Roofline Model
Similar resources
Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures
We propose an easy-to-understand, visual performance model that offers insights to programmers and architects on improving parallel software and hardware for floating point computations.
Multicore Performance Optimization Using Partner Cores
As the push for parallelism continues to increase the number of cores on a chip, and add to the complexity of system design, the task of optimizing performance at the application level becomes nearly impossible for the programmer. Much effort has been spent on developing techniques for optimizing performance at runtime, but many techniques for modern processors employ the use of speculative thr...
High-Performance Numerical Optimization on Multicore Clusters
This paper presents a software infrastructure for high performance numerical optimization on clusters of multicore systems. At the core, a runtime system implements a programming and execution environment for irregular and adaptive task-based parallelism. Building on this, we extract and exploit the parallelism of a global optimization application at multiple levels, which include Hessian calcu...
Study of Performance Issues on a SMP-NUMA System using the Roofline Model
This work presents a performance model based on a combination of hardware counters and a Roofline Model developed to characterize the behavior of the FINISTERRAE supercomputer, one of the largest SMP-NUMA systems. Our main objective is to provide an insightful model which allows to determine, at a glance, performance issues related to thread and memory allocation of irregular codes in this mach...
Real-Time Image Processing Applications on Multicore CPUs and GPGPU
This paper presents real-time image processing applications using multicore and multiprocessing technologies. To this end, parallel image segmentation was performed on many images covering the entire surface of the same metallic and cylindrical moving objects. Experiments with multicore CPUs showed that by increasing the chunk size, the execution time decreases approximately four times in compa...
Journal
Journal title: IOP Conference Series: Materials Science and Engineering
Year: 2021
ISSN: 1757-8981, 1757-899X
DOI: 10.1088/1757-899x/1152/1/012021